IMPROVED PERFORMANCE FIR FILTER
BACKGROUND OF THE INVENTION 

Technical Field of the Invention 

The present invention relates to FIR filters and, in particular, to FIR filters with look-up tables.
Description of the Related Art 

A classical Finite Impulse Response (FIR) filter multiplies each input data sample by an associated
coefficient. The FIR filter then accumulates the multiplied results to produce an output. Generally, FIR
filters are used to perform filtering functions for digital signals. Such functions include digital signal
highpass, lowpass, bandpass and notch filtering. The most likely uses for FIR filters are in digital
audio processing. Digital audio processing is used in a variety of well known devices including
radios; compact disc (CD) players for music and video; digital telephones including cellular, wireless
and hard wired; and digital video recording and playing equipment including computers, video disc
players, video camera recorders, and cameras.

A FIR filter generally operates in the time domain by multiplying FIR coefficients with bits
from the digital audio signal. The results of the multiplications are added or accumulated, thereby
producing the desired digitally filtered audio output.

In the recent past, single bit FIR filters have become more popular. Single bit FIR filters are
widely used in digital audio signal processing equipment because single bit data streams are
used with CDs and music streaming, to name a few applications. Another specific area in which
single bit data stream FIR filtering is used is direct stream transfer-direct stream digital
(DST-DSD) decoding of super audio compact disc (SACD) material.
SUMMARY OF THE INVENTION 
Embodiments of an exemplary FIR filter can perform or operate at the same level as, or at a higher
level than, prior FIR filters. Also, exemplary FIR filters may be less costly to manufacture than prior
single bit FIR filters.

In many cases, an exemplary FIR filter exceeds prior FIR filters in performance using
substantially the same generation of technology. These improvements are generally achieved by
using a portion of the input data stream to the FIR filter as a memory address. The addressed memory
location may store the multiplication and accumulation results for that portion of input data. The
multiplication and accumulation results are accumulated, if necessary, with the multiplication and
accumulation results of other portions of input data. Thus, the real time, physical multiplication of
input data bits by coefficients does not need to be performed for each input bit by an exemplary FIR
filter because a look-up table holds the answers for portions of input data. In some exemplary FIR
filters only addition functions have to be performed to produce a FIR filter output.
BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the system and method of the present invention may be 
obtained by reference to the following Detailed Description when taken in conjunction with the 
accompanying Drawings wherein: 
FIGURE 1 is a block diagram of a standard single bit FIR filter;

FIGURE 2 is an exemplary block diagram of a single or multiple bit FIR filter in accordance 
with the present invention; 

FIGURE 3 is a flow chart of an exemplary technique for loading the memory of an exemplary 
FIR filter; and 

FIGURE 4 is an exemplary flow diagram of an exemplary FIR filter.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS 

Exemplary embodiments of the present invention will now be described more fully 
hereinafter with reference to the accompanying drawings in which exemplary embodiments of the 
invention are shown. The invention, however, may be embodied in many different forms and should
not be construed as being limited to the embodiments set forth herein. Rather, the embodiments are 
provided so that this disclosure will be thorough and complete, and will fully convey the scope of the 
invention to those skilled in the art. 

Referring to FIGURE 1, a standard FIR filter 10 is depicted in block diagram form. The
standard FIR filter 10 is a single bit FIR filter. A single bit FIR filter is a FIR filter wherein the data
input into the FIR filter 10 is a serial stream of data that is one bit wide. The digital data enters the
input 12 of the standard FIR filter 10 and enters a delay line, serial latch, shift register or derivation
thereof 14. The bits of data are shifted or travel down the delay line 14 such that each bit is provided
as x0, x1, x2, ... xn. Each bit (x0, x1, x2, ... xn) is multiplied via a multiplier 16 with an associated
coefficient (C0, C1, C2, ... Cn), which may be stored in a memory, in a table, in a latch or in another
electronic storage means such as a flash memory, disk memory, RAM, ROM or derivations thereof.
The output 18 of each multiplier 16 is then accumulated or added in the adder 20 in order to produce
an output 22 of the standard single bit FIR filter 10. The coefficients C0, C1, C2, ... Cn can be single
or multiple bits wide, thereby making each multiplier's output 18 one or more bits wide. As such, the
multiple multiplication processes ultimately consume microprocessor or math coprocessor time
regardless of whether the standard prior art single bit FIR filter is implemented in hardware,
software, or a combination of both.

Still referring to FIGURE 1, if the prior art single bit FIR filter 10 is a 16 tap filter, meaning,
for example, the filter handles two 8 bit bytes of data at a time for x0, x1, x2, ... x15, then providing
each output 22, as the data shifts through the input nodes or registers, requires 16 multiply and 16
addition functions. One can see that processing data through a prior art single bit FIR filter is math
intensive even for a relatively small number of bits. For this example of a 16 tap, single bit FIR
filter, significant processing time is required to perform all of the 16 multiply functions and the 16
add functions.
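The cost described above can be illustrated with a minimal software sketch of a direct-form single bit FIR filter. The function name `direct_fir` and the choice of a Python list as the delay line are illustrative assumptions, not part of the described hardware; the point is only that every output costs one multiply and one add per tap.

```python
def direct_fir(bits, coeffs):
    """Direct-form single bit FIR: one multiply and one add per tap, per output."""
    taps = [0] * len(coeffs)          # delay line x0..xn, initially all zeros
    outputs = []
    for b in bits:
        taps = [b] + taps[:-1]        # shift: the newest bit becomes x0
        acc = 0
        for x, c in zip(taps, coeffs):
            acc += x * c              # multiply by the coefficient, then accumulate
        outputs.append(acc)
    return outputs
```

For a 16 tap filter the inner loop runs 16 times for every input bit, which is the arithmetic load the look-up table approach is intended to avoid.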

Referring now to FIGURE 2, an exemplary single bit FIR filter 40 is depicted in block
diagram form. The single bit FIR filter 40 has an input 42. A single bit stream of input data enters
and is shifted through the input to become input x0, then x0 and x1, then x0, x1 and x2, and so on. As
the input bits are shifted through nodes or shift registers x0, x1, ... xn, they can be used as a memory
address or with a memory address offset 44.
At each shift of the x0 - xn bits, one or more memory locations are addressed. The memory
locations may be found in a RAM, ROM, FLASH or other type of memory. Each addressed
memory location 46 holds a pre-computed multiplication (by the necessary coefficients) and
summation result 48 for one of the possible variations of the first 8 bits of shifted input data. At this
point, if the single bit FIR filter is an 8 tap filter, the output T0 48 is produced
without a single mathematical calculation because the memory location provides a precalculated
correct output T0. Conversely, if an 8 tap (x0 - x7) single bit FIR filter according to FIGURE 1 were
used, the prior art standard FIR filter 10 would have to perform 8 multiplication and 8 addition
functions to provide the output 22.
Referring back to FIGURE 2, when the single bit FIR filter has more input nodes or taps than
can be used to form a single memory address, then a second, third, fourth, and so on address offset
44' is provided to address additional memory locations in order to provide pre-computed
combinations of coefficient values for multiplication and summation results 46' for the additional
nodes.

The outputs from the memory locations of each of the pre-computed values 48 and each 48'
are provided to a summer or accumulator 50 that performs addition functions of two or more of the
pre-computed values 48 and each 48'. The output of the accumulator 50 is the output 52 of the
exemplary single bit FIR filter 40.

Revisiting FIGURE 2 from a slightly different perspective, the data enters the exemplary
single bit FIR filter 40 at the input 42. The entering bits may be either a 1 or a 0. To start with, suppose
the first bit is a 0 in x0; the remaining nodes x1 - x7 (the rest of the first byte) would also be zero because
no data has been shifted to them yet. Thus, the first 8-bit byte is made of all zeros including the first
bit in x0 of the input. The first byte of all zeros is received by the address offset 44. The address
offset 44 may add a predetermined number to the first byte so that a memory location that is greater
than or equal to the address offset 44 is addressed. The first byte plus the address offset becomes an
address used to pull a pre-computed result from memory. The pre-computed result is equal to the
result a standard prior art FIR filter would have calculated for the same input bits multiplied by
the same coefficients and then added together. The pre-computed results retrieved from
memory, 48 and each 48', are then provided to the accumulator 50 wherein they are added together to
provide an output 52 of the FIR filter.

The process is then repeated. The first bit moves to the next data position and the new,
second bit moves to the x0 position. These two bits x0 and x1 plus all of the old bits (which would be
zero initially) establish addresses via the address offsets 44 and 44' to retrieve the pre-computed
multiplication/summation values 48, 48' out of the memory or table 46. The retrieved pre-computed
multiplication/summation values are provided to the accumulator 50 to thereby produce the second
output 52.

This process continues to provide a third and fourth output and all the following outputs each
time the input data bits shift to another node in the input 42. The input 42 can be a delay line, a shift
register, an array in software or any reasonable derivation thereof. There does not have to be a
hardware input register 42 per se; the input could be called an input register or delay line without a
specific hardware implementation. In fact, an exemplary single bit FIR filter in accordance with the
present invention could be created and be operational completely in software.

One aspect of an exemplary embodiment is pre-computing the possible results for the 
multiplication and addition portions of a multiple tap, single bit FIR filter. 
More specifically, by taking advantage of the fact that the input is binary and that the data is a
single bit wide, pre-computed tables 46, 46' can be easily constructed for each possible input (0 or 1)
that is multiplied by each coefficient and added with, for example, similar results from seven other
input bits. The result would be that, for each possible combination of 8 input data bits (i.e., a byte of
input data bits) there would be a table entry providing a result for the FIR filter's multiplication and
addition steps for those 8 bits. Each table entry has an address. The address of each table entry can
be related either directly to the 8 input data bits or indirectly by an offset number being added to,
subtracted from, or attached as a prefix or suffix to the 8 bit binary number formed by the input bits.
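The addressing options just described can be sketched as follows. The helper `table_address`, its `mode` names and the default `offset_step` of 256 are hypothetical choices made for illustration; the text only states that the address may be the input bits themselves or the input bits combined with an offset.

```python
def table_address(input_byte, set_number, mode="direct", offset_step=256):
    """Form a table address from 8 input bits (hypothetical helper)."""
    if not 0 <= input_byte <= 0xFF:
        raise ValueError("input_byte must fit in 8 bits")
    if mode == "direct":
        return input_byte                              # the input bits are the address
    if mode == "add":
        return input_byte + set_number * offset_step   # an offset number is added
    if mode == "prefix":
        return (set_number << 8) | input_byte          # set number attached as a prefix
    raise ValueError("unknown mode")
```

With an offset step of 256, adding the offset and attaching the set number as a prefix produce the same address, which is why either description fits a table of 256 entries per set.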

Referring to FIGURE 3, an exemplary method of loading the tables or memory is provided.
The tables can be stored in read only memory (ROM) if the coefficients for the FIR filter are to
remain constant. Random access memory (RAM) or flash memory may be utilized for constant
coefficients as well as in situations when the coefficients require changes, recalculation, upgrades or
fine tuning in order to meet desired specifications.

At step 60 of FIGURE 3 a routine for loading tables into memory begins. Entry into this
routine may occur in the factory or each time an exemplary single bit FIR filter is powered on. An
exemplary single bit FIR filter has n nodes wherein a single bit is read at each node. For example, if
the single bit FIR filter is 32 bits long, then 32 bits are read from the nodes each time the input data
shifts one bit. The nodes may be divided up into 4 sets of 8 nodes. Each set could be considered an
8 bit byte of data. Each bit of each byte is multiplied by a coefficient. The results of each
calculation would then be added or accumulated in order to determine the filtered response for the 8
bit byte.
Again, using, for example, a 32 bit input single bit FIR filter, at step 62 each possible bit (0 or
1) is multiplied by the coefficient for the specific node. Each possible 8 bit combination is stepped
through (00000000, 00000001, 00000010, 00000011 and so on through 11111111) so that the
multiplication and accumulation calculations are performed for the 256 possible bit combinations for
the first 8 bit byte. The result of each calculation for each of the 256 possible combinations may be
stored in a table or a memory location having the memory address of the 8 bit byte with or without an
offset. The same process is done for the second, third and fourth sets of 8 bit bytes. In the end, a
table or memory map is created containing the output for each possible eight bit input to the four sets
of eight nodes.

At step 64, the result for each possible byte combination is stored in an associated memory
location. For example, a byte of the input data bits for the first set (x0 - x7) may be 10101111. The
corresponding result of the multiplication and accumulation process for the node bits
1,0,1,0,1,1,1,1 may be stored in memory location 10101111. A second byte of data bits for the
second set of 8 bit data (x8 - x15) may also be 10101111, but the result of its multiplication and
accumulation cannot be stored in the same memory location as the result for the
first set (x0 - x7). Thus, an address offset may be added to the second set of 8 bit data (x8 - x15)
10101111 so that the result is stored in a memory location that is different than the address 10101111
of the first set of 8 bit data. The results for all 4 sets of 8 bit bytes of data can be stored in the
memory locations in a similar manner.
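The loading routine of steps 60 through 64 can be sketched in software as shown here. This is a minimal illustration under stated assumptions: the names `load_tables`, `SET_WIDTH` and `TABLE_SIZE` are invented for the example, the memory is modeled as a flat Python list, the address offset for each set is taken to be the set number times 256, and bit k of a byte is assumed to correspond to the k-th node of its set.

```python
SET_WIDTH = 8                     # bits per set (one byte)
TABLE_SIZE = 1 << SET_WIDTH       # 256 possible byte values

def load_tables(coeffs):
    """Return a flat memory holding one 256-entry table per set of 8 taps."""
    if len(coeffs) % SET_WIDTH:
        raise ValueError("number of taps must be a multiple of 8 in this sketch")
    num_sets = len(coeffs) // SET_WIDTH
    memory = [0] * (num_sets * TABLE_SIZE)
    for set_no in range(num_sets):
        offset = set_no * TABLE_SIZE          # address offset for this set (assumed)
        group = coeffs[set_no * SET_WIDTH:(set_no + 1) * SET_WIDTH]
        for byte in range(TABLE_SIZE):        # 00000000 through 11111111
            total = 0
            for k, c in enumerate(group):
                total += ((byte >> k) & 1) * c    # bit k of the byte times its coefficient
            memory[offset + byte] = total         # store the pre-computed result
    return memory
```

For a 32 tap filter this produces 4 x 256 = 1024 entries. The byte 10101111 then indexes address 175 for the first set and address 256 + 175 for the second set, so the two results do not collide.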

It is understood that an embodiment of the present invention can use substantially any
number of bits as a table location or a memory address. For example, if an exemplary single bit FIR
filter is 128 bits long, then the data sets may be one 128 bit address (this may be impractical
presently but is possible), two 64 bit sets, four 32 bit sets, eight 16 bit sets, 16 eight bit sets or 32
four bit sets. It is understood also that each set does not necessarily have to be the same number of
bits as the other sets of input data bits.

A mathematical technique for determining each pre-calculated result is best understood by
reviewing the following example wherein a set or table has a width of four data bits. In other words,
for an eight bit word there are two four bit sets that are used to address the pre-computed tables 46,
46', each table having a total table width, w, of four. For example:





                      SET = 0            SET = 1            SET = m
Input Data Bits A     x0  x1  x2  x3     x4  x5  x6  x7     x8 ... xn
Input Data Bits B     x00 x01 x02 x03    x10 x11 x12 x13    xm1 ... xmw

Coefficients A        C0  C1  C2  C3     C4  C5  C6  C7     C8 ... Cn
Coefficients B        C00 C01 C02 C03    C10 C11 C12 C13    Cm1 Cm2 ... Cmw



wherein: w = 4 (bit positions 0, 1, 2, 3); the number of sets m = 2 (two four bit sets); and n = the
number of bits that can be input, during each data shift, into the FIR filter. Input Data Bits A represent
input bits into a prior art single bit FIR filter. Input Data Bits B represent the same bits as Input Data
Bits A, but the subscripts for the data are m, w (set number, bit position within the set). Coefficients A
represent the coefficients 0 through n that will each be multiplied by Input Data Bits A (x0 - xn).
Coefficients B represent the coefficients Cmw that will be multiplied by Input Data Bits B (xmw),
respectively. For a prior art, single input FIR filter a summation of each multiplication between xn
and Cn is performed and can be mathematically described as:
$$\text{output} = \sum_{n=0}^{n} x_n C_n \qquad (1)$$



This equation (1) requires n multiplications and n additions, thereby consuming microprocessor or
arithmetic processor time.

In an exemplary embodiment wherein the table width, w, is four, a table for set m = 0 would
look like:

for set = 0

TABLE INDEX     TABLE VALUE
0000            TV0,1
0001            TV0,2
0010            TV0,3
0011            TV0,4
0100            TV0,5
...             ...
1111            TV0,V



for set = m and w = 4

TABLE INDEX     TABLE VALUE
0000            TVm,1
0001            TVm,2
0010            TVm,3
0011            TVm,4
0100            TVm,5
...             ...
1111            TVm,V



wherein w = the set or table width; I = the table index for set m; TV = the table value; and
V = 2^w = the number of possible table values.

Thus, for set m = 1, w = 4 and the table index I = 0010, TVm,I is equal to:

$$TV_{1,2} = 0 \cdot C_7 + 0 \cdot C_6 + 1 \cdot C_5 + 0 \cdot C_4 = C_5 = C_{1,1}$$

Thus, for each TV in a set m having a width w:

$$TV_{m,I} = \sum_{w} \text{IndexValueBit}(w) \cdot C_{m,w}$$

The resulting single bit FIR filter output is thus equal to:

$$\text{output} = \sum_{m=0}^{m} TV_{m,I}$$

In other words, an exemplary FIR filter's output will be the summation of each table
value, TV, associated with the set of bits, m, that have shifted into the FIR filter's inputs. If the
exemplary single bit FIR filter is 16 bits long and the input of the filter is separated into four sets,
m, of four bits, w, then there will be only four look-ups into the tables and four accumulation
functions to produce the output. This exemplary process takes significantly less processing time than
prior art single bit FIR filters, which perform 16 multiply and 16 accumulation steps.
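The table value and output equations above can be checked with a short worked sketch. The function names `table_value` and `lut_output` and the sample coefficient values are assumptions made for illustration; bit w of the table index is taken to multiply coefficient C(m,w), which is the convention implied by the worked example for index 0010.

```python
def table_value(coeffs, set_no, index, width=4):
    """TV for one set: sum over w of bit w of the index times C[set_no, w]."""
    return sum(((index >> w) & 1) * coeffs[set_no * width + w] for w in range(width))

def lut_output(coeffs, indices, width=4):
    """Filter output: sum over sets m of the table value for that set's current index."""
    return sum(table_value(coeffs, m, idx, width) for m, idx in enumerate(indices))

# Illustrative coefficients C0..C7 for two four bit sets.
coeffs = [10, 20, 30, 40, 50, 60, 70, 80]
tv = table_value(coeffs, 1, 0b0010)        # set m = 1, index I = 0010 selects C5
out = lut_output(coeffs, [0b0001, 0b0010])  # x0 = 1 and x5 = 1, all other bits 0
```

In a real filter the table values would be read from memory rather than recomputed; the functions here only confirm that the summation of table values equals the direct sum of bit times coefficient products.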

In an embodiment with eight 16 bit sets, there will be eight memory look-ups and eight
accumulation steps to provide each output of an exemplary single bit FIR filter having a 128 bit input
feed. The pre-calculation of the 16 coefficient multiplications and 16 accumulations for each set (either
stored in ROM or performed at startup and stored in RAM or other program memory) greatly
decreases the arithmetic load on a microprocessor or co-processor in the system because all the
multiplication steps have been completed and a majority of the addition steps have also been
completed.
As the input set size gets larger, the amount of storage or memory required for the
pre-calculated results increases. For each 16 bit input set, about 64K memory locations (2^16 = 65,536
table entries) are required. Thus, a 128 bit input for the single bit FIR filter with eight 16 bit sets
would require about 512K memory locations.
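The memory figures above follow from simple arithmetic: a set that is w bits wide needs 2^w table entries. A small sketch, using a hypothetical helper named `table_entries`, makes the trade-off between set width and memory explicit for a 128 bit filter. The size of each entry in bytes depends on the coefficient word length and is not modeled here.

```python
def table_entries(set_widths):
    """Total number of pre-computed table entries for the given set widths."""
    return sum(1 << w for w in set_widths)   # a w bit set needs 2**w entries

# Partitions of a 128 bit filter mentioned in the description.
eight_sets_of_16 = table_entries([16] * 8)    # 8 look-ups per output
sixteen_sets_of_8 = table_entries([8] * 16)   # 16 look-ups per output
thirty_two_sets_of_4 = table_entries([4] * 32)  # 32 look-ups per output
```

Wider sets reduce the number of look-ups and additions per output but increase memory exponentially, and the helper also accepts sets of unequal width.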

It is understood by those of ordinary skill in the art that there may be some memory storage
shortcuts or "tricks" that can be utilized with embodiments of the present invention that do not
deviate from the spirit of the invention. For example, memory can be saved when there is a repetitious
organization of coefficients, or when a coefficient of zero is not used in a multiplication calculation to
create a stored value (because the multiplication result will equal 0). Other creative techniques for
decreasing the actual amount of memory required to store the pre-calculated data are also available
and would be known to one of ordinary skill in the art. For example, when different input bit
combinations to the FIR filter mathematically produce the same stored value, the same memory
address can be used for both data input combinations.
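One possible reading of the "repetitious organization of coefficients" shortcut is sketched here: when two sets share the same group of coefficients, a single table can serve both. The function `build_shared_tables` is a hypothetical illustration of that idea only; the description does not prescribe a particular memory saving technique.

```python
def build_shared_tables(coeffs, width):
    """Build one table per distinct coefficient group and share it between sets."""
    tables = {}          # coefficient group -> pre-computed table
    set_to_table = []    # one entry per set, possibly referring to a shared table
    for base in range(0, len(coeffs), width):
        group = tuple(coeffs[base:base + width])
        if group not in tables:
            tables[group] = [
                sum(c for k, c in enumerate(group) if (index >> k) & 1)
                for index in range(1 << len(group))
            ]
        set_to_table.append(tables[group])
    return set_to_table, len(tables)
```

If sets 0 and 2 of a 12 tap filter have identical coefficients, only two tables are stored instead of three, while the look-up and accumulation steps are unchanged.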

Exemplary embodiments of single bit FIR filters can be executed strictly in software running
on a computer. The software can be stored in the computer such that a microprocessor and related
electronic devices perform the steps prescribed by the software. FIGURE 4 depicts a flow diagram
for an exemplary embodiment as it may function in software, firmware or hardware or a combination
thereof. At step 70 the coefficients are multiplied with the possible sets of bits in the input and the
answers are accumulated and stored in proper memory locations as discussed with respect to
FIGURE 3. The memory locations coincide with the pre-computed memory portions 46, 46' in
FIGURE 2. The pre-calculated combinations of coefficients 46, 46' can be calculated in step 70
during manufacturing of a device or prior to each operation of an exemplary single bit FIR filter.

At step 72, a first bit from the serial bit stream or other data source is shifted into the first
input position node, shift register, variable register or temporary register of set 0. The input nodes of
set 0 (which at this point will be mostly zeros because only one bit has been shifted in) are used
either with or without the address offset function 44 to address a memory location in the
pre-computed combinations of coefficients memory 46. The memory location provides the precalculated
result of a single bit FIR filter having the same input profile as the bits in set 0.

At step 74, the contents of the set 0 addressed memory location are accumulated with the
contents of the memory locations associated with the input bits found in sets 1 through m. (In this
case, so far, they are all zero bits.) In step 76, the accumulation result is provided as the first output
of the exemplary single bit FIR filter.

At step 78, the next bit is shifted into the input such that the first data bit and all other bits are
shifted one or more positions. At step 72, the reading of the input bits over the length of the filter is
performed again and pre-computed values for each data set are retrieved. The loop of steps 72, 74,
76 and 78 is repeated as long as data bits are flowing into the FIR filter software program, hardware
or combination thereof.
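The loop of steps 70 through 78 can be sketched entirely in software as follows. This is an illustrative sketch rather than a definitive implementation: the function name `lut_fir`, the use of a Python integer as the input shift register (bit k holding node xk) and the requirement that the tap count be a multiple of the set width are all assumptions made for the example.

```python
def lut_fir(bits, coeffs, width):
    """Single bit FIR via table look-ups, following steps 70 through 78."""
    if len(coeffs) % width:
        raise ValueError("tap count must be a multiple of width in this sketch")
    num_sets = len(coeffs) // width
    mask = (1 << width) - 1
    # Step 70: pre-compute one table per set of `width` coefficients.
    tables = [
        [sum(c for k, c in enumerate(coeffs[m * width:(m + 1) * width]) if (i >> k) & 1)
         for i in range(1 << width)]
        for m in range(num_sets)
    ]
    state = 0                                  # all input nodes start at zero
    state_mask = (1 << len(coeffs)) - 1
    outputs = []
    for b in bits:
        # Steps 72 and 78: shift the new bit into x0; older bits move up one node.
        state = ((state << 1) | (b & 1)) & state_mask
        acc = 0
        for m in range(num_sets):              # Step 74: accumulate one look-up per set
            acc += tables[m][(state >> (m * width)) & mask]
        outputs.append(acc)                    # Step 76: provide the output
    return outputs
```

Each output costs one look-up and one addition per set, with no multiplications in the loop, and the results match a direct multiply and accumulate computation over the same bits and coefficients.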

In another exemplary embodiment, depicted in FIGURE 2, instead of a single bit FIR filter, a
two bit FIR filter is shown using input 42'. Here a two bit wide stream of data is shifted into the
input 42' as x01, x02, x11, x12, x21, x22, ... xn1, xn2 ... xnd (wherein n = the bit number and d = the
bit width of the input).

In a similar manner as in the other exemplary embodiments, a pre-calculated table is made for
the coefficient-multiplication and accumulation of each of the 2 bit width data. The results are stored
in memory locations having addresses related to the input data bits.
When the 2 bit FIR filter is operational, the two bit data entries shift down the input 42' and
the appropriate multiplication and accumulation results are fetched from memory. The fetched
results are accumulated 50 to produce an output 52 for each shift movement of data. It is understood
that embodiments having more than a two bit wide input stream are possible so long as enough
memory is available for the pre-calculated multiplication and accumulation functions.
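A table for a multiple bit wide input can be sketched in the same way as the single bit tables. The sketch assumes, purely for illustration, that each sample is an unsigned value of `sample_bits` bits and that the table address is formed by concatenating the samples of one set; the description does not fix either choice, and the function name `build_multibit_table` is hypothetical.

```python
def build_multibit_table(group, sample_bits=2):
    """Table for one set: the address is the concatenation of the set's samples."""
    field_mask = (1 << sample_bits) - 1
    table = []
    for address in range(1 << (sample_bits * len(group))):
        total = 0
        for k, c in enumerate(group):
            sample = (address >> (k * sample_bits)) & field_mask  # k-th sample of the set
            total += sample * c                                   # sample times coefficient
        table.append(total)
    return table
```

A set of w samples that are each d bits wide needs 2^(d*w) entries under this scheme, which illustrates why wider input streams are limited by the available memory.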

The previous description is of a preferred embodiment for implementing the invention, and 
the scope of the invention should not necessarily be limited by this description. The scope of the 
present invention is instead defined by the following claims. 